-
Notifications
You must be signed in to change notification settings - Fork 32
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add module for loading test data. #120
Conversation
There are a few tests I'll need to change in |
65370a9
to
a37f75e
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wonderful PR! Well written and great use of libraries. I have a few suggested changes. The most important one being not having a global s3
client in the load
module. Details in-line. Looking forward to using this myself soon!
sub-packages/bionemo-testing/src/bionemo/testing/data/resource.py
Outdated
Show resolved
Hide resolved
a5f51af
to
c6eab45
Compare
/build-ci |
sub-packages/bionemo-testing/src/bionemo/testing/data/resource.py
Outdated
Show resolved
Hide resolved
sub-packages/bionemo-testing/src/bionemo/testing/data/resource.py
Outdated
Show resolved
Hide resolved
sub-packages/bionemo-testing/src/bionemo/testing/data/resource.py
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approving to unblock since we're on different coasts and I don't want to hold up your PR as you work through it :) But I would still like engagement on the review comments.
Overall, this is a well-written and clear PR 🚀
ce4c68a
to
433e520
Compare
bbb4441
to
1f5f63a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the change, seems like it will be really helpful. Could we add a README or documentation somewhere on how we are expected to interact with this?
As a first time user this is my surface expectations from looking at this code:
load
is the setup method I'll want to call for whatever resource I need - should be toplevel in my test or a fixture- Need to add a new yaml entry for whatever resource I want included.
Is that right? I'm guessing it will be less clear when the diff isnt right infront of me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🎉
1f5f63a
to
853be25
Compare
Offloads much of the heavy lifting to pooch, but implements a system for specifying and downloading files to a local cache that will persist between devcontainer builds. Signed-off-by: Peter St. John <[email protected]>
4927d14
to
c92661f
Compare
Implements a system for specifying remote files in a more flexible fashion than the previous
download_artifacts
script.This lets us write yaml files in
sub-packages/bionemo-testing/src/bionemo/testing/data/resources/
(for instance,geneformer.yaml
below) that specifies different remote data sources:Then calling
load(filename/tag)
returns a cached directory with the downloaded file(s). Compressed files are automatically unzipped, tar files are automatically unpacked (you get back the folder root of the tarfile).This should make it simpler to add new data for examples, tests, etc., and no longer requires users to run
download_artifacts
upfront before running the test suite.The implementation offloads much of the heavy lifting to pooch. It also adds the default .cache folder location to the devcontainer's folder mounts, so users should be able to have these files persist across container runs.
NGC downloads
This now supports downloading from NGC as well! You need to specify both the NGC tag and the registry (i.e.,
model
orresource
). See this entry for ESM2: